AITopics | surrogate loss

On The Existence of The Adversarial Bayes Classifier

Neural Information Processing SystemsMay-1-2026, 01:50:55 GMT

While it has been the subject of several recent theoretical studies, many important questions related to adversarial robustness are still open.

Add feedback

Optimized Deferral for Imbalanced Settings

Cortes, Corinna, Mao, Anqi, Mohri, Mehryar, Zhong, Yutao

arXiv.org Machine LearningMay-1-2026

Learning algorithms can be significantly improved by routing complex or uncertain inputs to specialized experts, balancing accuracy with computational cost. This approach, known as learning to defer, is essential in domains like natural language generation, medical diagnosis, and computer vision, where an effective deferral can reduce errors at low extra resource consumption. However, the two-stage learning to defer setting, which leverages existing predictors such as a collection of LLMs or other classifiers, often faces challenges due to an expert imbalance problem. This imbalance can lead to suboptimal performance, with deferral algorithms favoring the majority expert. We present a comprehensive study of two-stage learning to defer in expert imbalance settings. We cast the deferral loss optimization as a novel cost-sensitive learning problem over the input-expert domain. We derive new margin-based loss functions and guarantees tailored to this setting, and develop novel algorithms for cost-sensitive learning. Leveraging these results, we design principled deferral algorithms, MILD (Margin-based Imbalanced Learning to Defer), specifically suited for expert imbalance settings. Extensive experiments demonstrate the effectiveness of our approach, showing clear improvements over existing baselines on both image classification and real-world Large Language Model (LLM) routing tasks.

large language model, machine learning, natural language, (16 more...)

arXiv.org Machine Learning

2604.27723

Country: North America (0.46)

Genre: Research Report (0.40)

Industry: Education > Educational Setting (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.88)
Information Technology > Artificial Intelligence > Natural Language > Generation (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

The Adversarial Consistency of Surrogate Risks for Binary Classification

Neural Information Processing SystemsApr-28-2026, 20:00:16 GMT

We study the consistency of surrogate risks for robust binary classification. It is common to learn robust classifiers by adversarial training, which seeks to minimize the expected 0-1 loss when each example can be maliciously corrupted within a small ball. We give a simple and complete characterization of the set of surrogate loss functions that are consistent, i.e., that can replace the 0-1loss without affecting the minimizing sequences of the original adversarial risk, for any data distribution. We also prove a quantitative version of adversarial consistency for the ρ-margin loss. Our results reveal that the class of adversarially consistent surrogates is substantially smaller than in the standard setting, where many common surrogates are known to be consistent.

artificial intelligence, consistency, machine learning, (15 more...)

Neural Information Processing Systems

Country: North America > United States > New York (0.14)

Genre: Research Report > New Finding (0.34)

Industry: Information Technology (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Calibration and Consistency of Adversarial Surrogate Losses

Neural Information Processing SystemsApr-25-2026, 21:41:53 GMT

Adversarial robustness is an increasingly critical property of classifiers in applications. The design of robust algorithms relies on surrogate losses since the optimization of the adversarial loss with most hypothesis sets is NP-hard. But, which surrogate losses should be used and when do they benefit from theoretical guarantees? We present an extensive study of this question, including a detailed analysis of the H-calibration and H-consistency of adversarial surrogate losses. We show that convex loss functions, or the supremum-based convex losses often used in applications, are not H-calibrated for common hypothesis sets used in machine learning.

artificial intelligence, hypothesis, machine learning, (17 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (0.66)

Add feedback

514a70448c235ccb8b6842ef5e02ad3b-Paper.pdf

Neural Information Processing SystemsApr-25-2026, 21:41:50 GMT

artificial intelligence, machine learning, surrogate loss, (14 more...)

Neural Information Processing Systems

Country: North America > United States (0.14)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)

Add feedback

3dfe2f633108d604df160cd1b01710db-Paper.pdf

Neural Information Processing SystemsApr-25-2026, 13:48:05 GMT

artificial intelligence, machine learning, storchastic, (15 more...)

Neural Information Processing Systems

Country:

Europe (0.93)
North America > United States > California (0.14)
North America > United States > New York (0.14)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback

0b17d256cf1fe1cc084922a8c6b565b7-Supplemental-Conference.pdf

Neural Information Processing SystemsApr-24-2026, 15:50:06 GMT

artificial intelligence, machine learning, surrogate loss, (14 more...)

Neural Information Processing Systems

Country: North America (0.45)

Genre: Research Report (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Add feedback

Two-Stage Learning to Defer with Multiple Experts

Neural Information Processing SystemsApr-24-2026, 15:50:02 GMT

We study a two-stage scenario for learning to defer with multiple experts, which is crucial in practice for many applications. In this scenario, a predictor is derived in a first stage by training with a common loss function such as cross-entropy. In the second stage, a deferral function is learned to assign the most suitable expert to each input. We design a new family of surrogate loss functions for this scenario both in the score-based and the predictor-rejector settings and prove that they are supported by H-consistency bounds, which implies their Bayes-consistency. Moreover, we show that, for a constant cost function, our two-stage surrogate losses are realizable H-consistent. While the main focus of this work is a theoretical analysis, we also report the results of several experiments on CIFAR-10 and SVHN datasets.

artificial intelligence, machine learning, surrogate loss, (15 more...)

Neural Information Processing Systems

Country: North America > United States (0.28)

Genre: Research Report (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Add feedback

Contents of Appendix

Neural Information Processing SystemsApr-24-2026, 09:15:59 GMT

Bayes-consistency only holds for the full family of measurable functions, which of course is distinct from the more restricted hypothesis set used by a learning algorithm. Therefore, a hypothesis setdependent notion of H-consistency has been proposed by Long and Servedio (2013) in the realizable setting, used by Zhang and Agarwal (2020) for linear models, and generalized by Kuznetsov et al. (2014) to the structured prediction case. Long and Servedio (2013) showed that there exists a case where a Bayes-consistent loss is not H-consistent while inconsistent losses can be H-consistent. Zhang and Agarwal (2020) further investigated the phenomenon in (Long and Servedio, 2013) and showed that the situation of losses that are not H-consistent with linear models can be remedied by carefully choosing a larger piecewise linear hypothesis set. Kuznetsov et al. (2014) proved positive results for the H-consistency of several multi-class ensemble algorithms, as an extension of H-consistency results in (Long and Servedio, 2013). Recently, the notions of H-calibration and H-consistency have been used by Bao et al. (2020); Awasthi et al. (2021a) in the study of adversarial binary classification losses, as defined in (Goodfellow et al., 2014; Madry et al., 2017; Tsipras et al., 2018; Carlini and Wagner, 2017; Awasthi et al., 2023).

artificial intelligence, machine learning, ymax, (19 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback